Abstract

Spoken language processing requires speech and nat- 
ural language integration. Moreover, spoken Korean 
calls for unique processing methodology due to its lin- 
guistic characteristics. This paper presents SKOPE, a 
connectionist/symbolic spoken Korean processing en- 
gine, which emphasizes that: 1) connectionist and sym- 
bolic techniques must be selectively applied accord- 
ing to their relative strength and weakness, and 2) 
the linguistic characteristics of Korean must be fully
considered in phoneme recognition, speech and language
integration, and morphological/syntactic processing. The
design and implementation of SKOPE demonstrate how
connectionist/symbolic hybrid architectures can be
constructed for spoken agglutinative language processing.
SKOPE also introduces diphone-based connectionist phoneme
recognition, phoneme-lattice-based morphological analysis,
and table-driven interactive relaxation parsing. The
phoneme recognition, morphological analysis, and syntactic
analysis experiments show that SKOPE is a viable approach
to spoken Korean processing.

Introduction 

Spoken language processing requires integrating speech
recognition with natural language processing, and must deal
with multi-level knowledge sources from the signal level to
the symbol level. Integrating and handling this multi-level
knowledge increases the technical difficulty of both speech
and natural language processing. On the speech recognition
side, recognition must be at the phoneme level for
large-vocabulary continuous speech, and the speech
recognition module must provide the right level of output
to the natural language module: not a single solution but
many alternative hypotheses. The n-best list (?),
word-graph (?), and word-lattice (?) techniques are mostly
used for this purpose. The speech recognition module can
also request linguistic scores from the language processing
module in a more tightly coupled bottom-up/top-down hybrid
integration scheme (?). On the natural language side, the
insertion, deletion, and substitution errors of continuous
speech must be compensated for by robust parsing and
partial parsing techniques, e.g. (?). Spoken language is
often ungrammatical and fragmentary, contains disfluencies
and speech repairs, and must be processed incrementally
under time constraints (?).

Most speech and natural language systems developed for
English and other Indo-European languages neglect
morphological processing and integrate speech and natural
language at the word level (?; ?). These systems often
employ a pronunciation dictionary for speech recognition
and independent dictionaries for natural language
processing. However, for agglutinative languages such as
Korean and Japanese, morphological processing plays a major
role in language processing, since these languages have
very complex morphological phenomena and relatively simple
syntactic functionality. Unfortunately, even Japanese
researchers apply only degenerate morphological techniques
to spoken Japanese processing (?; ?). Degenerate
morphological processing limits the usable vocabulary size
of the system, and a word-level dictionary results in an
exponential explosion in the number of dictionary entries.
For agglutinative languages, we need sub-word-level
integration, which leaves room for general morphological
processing.

Spoken language processing calls for multi-strategic
approaches in order to deal with signal-level as well as
symbol-level information in a symbiotic and unified way.
Recent developments in connectionist speech recognition (?)
and connectionist natural language processing (?) shed
light on connectionist/symbolic hybrid models of spoken
language processing, and some research results are already
available for English and other Indo-European languages (?;
?). We feel that it is the right time to develop
connectionist/symbolic hybrid spoken language processing
systems for agglutinative languages such as Korean and
Japanese.

This paper presents one such endeavor, SKOPE (Spoken Korean
Processing Engine), which has the following unique
features: 1) The connectionist and symbolic techniques are
selectively used according to their strengths and
weaknesses. The learning capability, fault tolerance, and
ability to integrate multiple signal-level sources
simultaneously make the connectionist techniques suitable
for phoneme recognition from speech signals, whereas the
structure manipulation and powerful matching (binding)
properties of the symbolic techniques are the better choice
for the complex morphological processing of Korean. For
syntactic processing, however, the parallel multiple
constraint relaxation capability of the connectionist
techniques is applied together with the symbolic structure
binding techniques. 2) The linguistic characteristics of
Korean are fully considered in phoneme recognition, speech
and language integration, and morphological/syntactic
processing. 3) SKOPE provides multi-level application
program interfaces (APIs) that expose the phoneme-level,
morphological-level, or syntactic-level services to
applications such as spoken language interfaces, voice
information retrieval, and spoken language translation.

We hope the experience of SKOPE development provides viable
answers to some of the open questions in speech and
language processing, such as 1) how learning and encoding
can be synergetically combined in speech and language
processing, 2) which aspects of system architecture have to
be considered in spoken language processing, especially in
connectionist/symbolic hybrid systems, and finally 3) what
is the most efficient way of integrating speech and
language, especially for agglutinative languages.

Characteristics of spoken Korean 

This section briefly explains the linguistic
characteristics of spoken Korean before describing the
SKOPE system. In this paper, Yale romanization is used to
represent the Korean phonemes. 1) A Korean word, called an
Eojeol, consists of one or more morphemes with clear-cut
morpheme boundaries. 2) Korean is a postpositional language
with many kinds of noun-endings, verb-endings, and prefinal
verb-endings. These functional morphemes determine the
noun's case roles, the verb's tenses and modals, and the
modification relations between Eojeols. 3) Korean is
basically an SOV language but has relatively free word
order compared to rigid word-order languages such as
English, except for the constraint that the verb must
appear in sentence-final position. However, some word-order
constraints do exist in Korean: the auxiliary verbs
representing modalities must follow the main verb, and
modifiers must be placed before the word (called the head)
they modify. 4) The unit of pause in speech (called an
Eonjeol) may be different from that of written text (an
Eojeol). Spoken morphological analysis must deal with an
Eonjeol, since no Eojeol boundary is provided in speech.
5) Phonological changes can occur within a morpheme,
between morphemes in an Eojeol, and even between Eojeols in
an Eonjeol. These changes include consonant and vowel
assimilation, dissimilation, insertion, deletion, and
contraction. 6) Korean has many rising diphthongs that are
very similar to mono-vowels at the signal level. Korean has
well-developed syllable structures, and unlike Japanese,
which has only CV^1-type syllables, Korean has all the
different types, such as CV, VC, V, and CVC. Moreover, in a
CVC-type syllable, the first and second consonants are
almost the same in pronunciation. These signal
characteristics make it difficult to use phonemes or
syllables directly as sub-word recognition units.

Figure 1: The spoken Korean processing engine architecture.
The architecture has two-level interfaces between modules:
phoneme lattice and morpheme lattice for efficient and
generalized speech and natural language integration.

The SKOPE architecture 

The above characteristics of spoken Korean and the relative
strengths and weaknesses of symbolic and connectionist
techniques result in the general SKOPE architecture shown
in figure 1. The architecture consists of three different
but closely interrelated modules: the phoneme recognition,
morphological analysis, and syntactic analysis modules. The
phoneme recognition module processes the signal-level
information and converts it to symbol-level information
(a phoneme lattice). The morphological analysis module
begins the primitive language processing and connects the
speech recognition to the language processing at the
phoneme level. The syntactic analysis module finishes the
language processing^2 and produces domain-independent
syntactic structures for application systems. The following
subsections briefly describe each module.



1 C: consonant, V: vowel 

2 We believe that the semantic and pragmatic processing 
should be integrated into the domain knowledge for prac- 
tical application under the current NLP technology, so we 
excluded the semantic and pragmatic processing from our 
general model. 



diphone type   number   examples
V                  21   a, o, wu, i, u, ye, ...
C1V               378   ha, sa, ka, la, ma, kha, ...
VC2               147   an, am, eng, em, wun, in, ...
C2C1              126   ngs, nn, ngt, ngh, ...

Figure 2: Four different Korean diphone types (V: vowel,
C1: syllable-first consonant, C2: syllable-final
consonant).



Diphone-based connectionist phoneme 
recognition 

Phoneme recognition is performed by a hierarchically
organized group of TDNNs (time delay neural networks) (?).
Considering the signal characteristics of the Korean
phonemes, we define diphones as a new sub-word recognition
unit. The defined diphones are shown in figure 2 and are
classified into four different types. The diphones handle
co-articulation in a way similar to the popular triphones
(?) but are far fewer in number.

Figure 3 shows the architecture of the component TDNNs in
the phoneme recognition module. The whole module consists
of 19 different TDNNs for recognizing the defined Korean
diphones. The top-level TDNN identifies the 18 vowel groups
of diphones (we reclassified the total of 672 diphones into
18 different groups according to the vowels contained in
the diphones). The 18 different sub-TDNNs recognize the
target diphones.
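The two-stage organization described above can be sketched as follows. This is only an illustrative sketch, not the SKOPE implementation: the function name `recognize_diphone`, the callables standing in for trained TDNNs, and the label strings are assumptions made for the example.

```python
def recognize_diphone(window, group_net, sub_nets):
    """Two-stage recognition of one input window.

    group_net: callable standing in for the top-level TDNN; returns a
               vowel-group label (hypothetical interface).
    sub_nets:  dict mapping each group label to a callable standing in
               for that group's sub-TDNN; returns a diphone label.
    """
    group = group_net(window)            # stage 1: which vowel group?
    diphone = sub_nets[group](window)    # stage 2: which diphone in that group?
    return group, diphone


# Toy stand-ins for trained networks, for illustration only.
toy_group_net = lambda w: "a"
toy_sub_nets = {"a": lambda w: "ha"}
result = recognize_diphone("placeholder window", toy_group_net, toy_sub_nets)
```

Dispatching on the group label means that only one of the 18 sub-networks has to be evaluated for each window.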

For the training of the TDNNs, we manually segment the
digitized speech into 200 msec ranges (each of which
roughly includes the left-context phoneme, the target
diphone, and the right-context phoneme), and perform
512-order FFTs and 16-step mel-scaling (?) to get the
filter-bank coefficients. Each frame is 10 msec long, so
20 (frames) by 16 (mel-scaling factor) values are fed to
the TDNNs with the proper output symbols, that is, the
vowel group name or the target diphone names. After the
training of each TDNN, phoneme recognition is performed by
feeding 200 msec signals to the vowel group identification
network and subsequently to the proper diphone recognition
network. The 200 msec signals are shifted in 30 msec steps
and continuously fed to the networks to process the
continuous speech in an Eonjeol. From the resulting diphone
sequences, the necessary phoneme lattice has to be
constructed. We use a simple deterministic decoding
heuristic and try to maintain all the possible diphone
spotting results, since the later
phonological/morphological processing can safely prune the
incorrect recognitions. The decoding begins by grouping the
diphones into the same types (see figure 2). The frequency
count for each diphone, that is, the number of specific
diphones per 30 msec frame shift, is used to fix the
insertion errors by deleting the diphones with lower
frequency counts. Finally, the diphones are split into
their constituent phonemes by merging the same phonemes in
the neighboring diphones.

Figure 3: (a) The TDNN architecture for the vowel group
identification. Note the cc group contains no vowels.
(b) The architecture of the sub-TDNN for the /a/ vowel
group. The other 17 sub-TDNNs have the same architecture
but a different number of output units, according to the
number of diphones in each vowel group.
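The windowing and decoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SKOPE decoder: the function names, the representation of a diphone as a tuple of phoneme symbols, the count threshold `min_count`, and the sample spotting results are all hypothetical.

```python
from collections import Counter

WINDOW_MS = 200   # window fed to the TDNNs (from the text)
SHIFT_MS = 30     # shift between successive windows (from the text)


def window_starts(signal_ms):
    """Start times (msec) of every 200 msec window that fits in the signal."""
    return list(range(0, signal_ms - WINDOW_MS + 1, SHIFT_MS))


def prune_by_count(spotted, min_count=2):
    """Delete diphones spotted fewer than min_count times (assumed threshold),
    then collapse consecutive repeats of the same diphone into one."""
    counts = Counter(spotted)
    kept = [d for d in spotted if counts[d] >= min_count]
    collapsed = []
    for d in kept:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    return collapsed


def merge_to_phonemes(diphones):
    """Split diphones into phonemes, merging a phoneme shared by neighbours."""
    phonemes = []
    for diphone in diphones:
        for ph in diphone:
            if not phonemes or phonemes[-1] != ph:
                phonemes.append(ph)
    return phonemes


# Hypothetical spotting results for ci-wul-sswu; ("k", "i") is a spurious
# insertion that is spotted only once.
spotted = [("c", "i"), ("c", "i"), ("k", "i"), ("wu", "l"), ("wu", "l"),
           ("l", "ss"), ("l", "ss"), ("ss", "wu"), ("ss", "wu")]
phonemes = merge_to_phonemes(prune_by_count(spotted))
```

This sketch keeps a single best sequence for brevity, whereas the text keeps all plausible spotting results so that a lattice can be built.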

Table-driven morphological and 
phonological analysis 

The morphological analysis starts with the phoneme lattice.
The phoneme lattice delivers the alternative phonetic
transcriptions^3 of the input speech, which must be
searched by the morphological/phonological analyzer to
reconstruct the orthographic morpheme strings. The
conventional morphological analysis procedure (?), that is,
morpheme segmentation, morphotactics modeling, and
orthographic rule (or phonological rule) modeling, must be
augmented and extended as follows: 1) the conventional
morpheme segmentation is extended to deal with the
exponential number of phoneme sequences and the
between-morpheme phonological changes during the
segmentation, 2) the morphotactics modeling is extended to
cope with the complex verb-endings and noun-endings (or
postpositions), and 3) the orthographic rule modeling is
combined with the phonological rule modeling to correctly
transform the phonetic transcriptions into the orthographic
morpheme sequences.

The central part of the morphological analysis lies in the
dictionary construction. In our dictionary, each phonetic
transcription of a single morpheme has a separate
dictionary entry. Figure 4 shows the unified dictionary for
both speech and language processing (called the
morpheme-level phonetic dictionary) with three different
morpheme entries: ci-wu, l, and swu.

3 Unlike English, the Korean alphabet is truly phonetic
in the sense that each phoneme is pronounced as it is
written. That is why we sometimes use phonetic and phonemic
interchangeably.

Figure 4: The morpheme-level phonetic dictionary.

Figure 5: Morphological parsing of the phoneme lattice
(from top: output morpheme sequence in an Eonjeol,
triangular parsing table, input phoneme lattice).

The extended morphological analysis is based on the
well-known tabular parsing technique for context-free
languages (?) and is augmented to handle the Korean
phonological rules and phoneme-lattice input. Figure 5
shows our extended table-driven morphological analysis
process. The example phoneme lattice was obtained from the
input speech ci-wul-sswu (removable), and the morphological
analysis produces ci-wu+l+swu
(remove+ADNOMINAL+BOUND-NOUN), where '+' is the morpheme
boundary and '-' is the syllable boundary.

The extended morpheme segmentation is basically performed
by dictionary search. During the left-to-right scan of the
input phoneme lattice, when a morpheme boundary is found in
the lattice, the morpheme is enrolled in the triangular
table at the appropriate position. For example, in
figure 5, morphemes such as ci-wu, l, and swu are enrolled
in the table positions (1,3), (4,4), and (5,6),
respectively. The position (i,j) designates the starting
and ending positions of the enrolled morpheme. However,
since the input is a phoneme lattice, exponential time is
required in total to find all the possible morpheme
boundaries. To cope with this exponential explosion, the
dictionary is organized as a trie structure (?) using the
phonetic transcriptions as trie indices, so that a
breadth-first search of the trie can prune the unnecessary
phoneme sequences early in the search.
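The trie-guided segmentation described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the SKOPE code: the lattice is modelled as one set of alternative phonemes per position, the names `build_trie` and `enroll` are hypothetical, and the sample entries and phoneme alternatives are invented for the example.

```python
END = "$"  # marker key for "a dictionary entry ends here" (assumed convention)


def build_trie(entries):
    """entries: iterable of (phoneme tuple, morpheme label)."""
    root = {}
    for phonemes, morpheme in entries:
        node = root
        for ph in phonemes:
            node = node.setdefault(ph, {})
        node.setdefault(END, []).append(morpheme)
    return root


def enroll(lattice, trie):
    """Return {(i, j): [morphemes]} for every dictionary entry that matches
    lattice positions i..j (1-indexed, inclusive)."""
    table = {}
    for start in range(len(lattice)):
        frontier = [trie]  # trie nodes still consistent with the lattice
        for end in range(start, len(lattice)):
            # Breadth-first step: advance every live node by every
            # alternative phoneme at this lattice position.
            frontier = [node[ph] for node in frontier
                        for ph in lattice[end] if ph in node]
            if not frontier:
                break  # prune: no dictionary entry has this prefix
            for node in frontier:
                for morpheme in node.get(END, []):
                    table.setdefault((start + 1, end + 1), []).append(morpheme)
    return table


# Hypothetical dictionary and lattice for the input speech ci-wul-sswu.
trie = build_trie([(("c", "i", "wu"), "ci-wu"),
                   (("l",), "l"),
                   (("ss", "wu"), "swu")])
lattice = [{"c", "ch"}, {"i"}, {"wu", "u"}, {"l"}, {"ss"}, {"wu"}]
table = enroll(lattice, trie)
```

Because a path is dropped as soon as no entry shares its prefix, most of the alternative phoneme chains in the lattice are never enumerated.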

After all the morphemes are enrolled in the table, the
morphotactics modeling is necessary in order to combine
only legal morphemes into an Eojeol (Korean word); this
process is called morpheme-connectivity-checking. Since
Korean has well-developed postpositions (noun-endings,
verb-endings, prefinal verb-endings) that act as
grammatical functional morphemes, we must assign each
morpheme proper part-of-speech (POS) tags for efficient
connectivity checking. Our more than 200 POS tags, which
are refined from the 13 major Korean lexical categories,
are hierarchically organized and contained in the
dictionary (under the name of morphological connectivity,
see figure 4). In the case of idiomatic expressions, we
place the idioms directly in the dictionary for efficiency,
where two different POS tags are necessary for the left and
the right morphological connectivity. For a single
morpheme, the left and the right POS tags are always the
same. A separate morpheme-connectivity-matrix indicates the
legal morpheme combinations, and the morphotactics modeling
is performed using the POS tags (in the dictionary) and the
morpheme-connectivity-matrix.
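The connectivity check described above can be sketched as a lookup in a set of legal tag pairs. This is a minimal sketch under stated assumptions: the three POS tag names, the dictionary-entry layout, and the function names are hypothetical, and the real matrix covers more than 200 tags.

```python
# Hypothetical legal (right tag of left morpheme, left tag of right morpheme)
# pairs, standing in for the morpheme-connectivity-matrix.
MORPHEME_CONNECTIVITY = {
    ("verb-stem", "adnominal-ending"),
    ("adnominal-ending", "bound-noun"),
}


def connectable(left, right, matrix=MORPHEME_CONNECTIVITY):
    """True if the right-hand tag of `left` may precede the left-hand tag of `right`."""
    return (left["right_pos"], right["left_pos"]) in matrix


def is_legal_chain(morphemes, matrix=MORPHEME_CONNECTIVITY):
    """True if every adjacent pair in the chain is connectable."""
    return all(connectable(a, b, matrix)
               for a, b in zip(morphemes, morphemes[1:]))


# Single morphemes carry the same tag on both sides; an idiom entry could
# carry two different tags.
ciwu = {"form": "ci-wu", "left_pos": "verb-stem", "right_pos": "verb-stem"}
adn = {"form": "l", "left_pos": "adnominal-ending", "right_pos": "adnominal-ending"}
swu = {"form": "swu", "left_pos": "bound-noun", "right_pos": "bound-noun"}
legal = is_legal_chain([ciwu, adn, swu])
illegal = is_legal_chain([swu, ciwu])
```

Storing separate left and right tags per entry is what lets an idiom placed directly in the dictionary connect differently on each side.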

The orthographic rule modeling must be integrated with the
phonological rule modeling in spoken language processing.
Since we must deal with the phoneme lattice, conventional
rule-based modeling requires an exponential number of rule
applications (?). Our solution is therefore based on the
declarative modeling of both orthographic and phonological
rules in a uniform way. That is, in our dictionary, the
conjugated verb forms as well as the original verb forms
are enrolled, and the same morphological connectivity
information is applied. The phonological rule modeling is
also accomplished declaratively by having the phonemic
connectivity information in the dictionary. The phonemic
connectivity information for each morpheme declares the
possible phonemic changes in the first (left) and last
(right) phonemes of the morpheme, and a separate
phoneme-connectivity-matrix indicates the legal sound
combinations in Korean phonology. For example, in figure 5,
the morpheme l can be combined with the morpheme swu during
the morpheme connectivity checking even though swu is
actually pronounced as sswu, because the
phoneme-connectivity-matrix supports the legality of the
combination of the l sound with the ss sound^4. In this
way, we can declaratively model all the major Korean
phonology rules, such as second consonant standardization,
consonant assimilation, palatalization, glottalization,
insertion, deletion, and contraction.

Table-driven connectionist / symbolic 
syntax analysis 

The phoneme lattice-based morphological analysis produces
morphologically analyzed (segmented and stem-reconstructed)
morpheme sequences. Since there is usually more than one
analysis result due to the errors of the speech recognition
process, the outputs are usually organized as a morpheme
lattice. For seamless integration of the morphological
analysis with the syntax analysis, we employ the same
table-driven control for the syntax analysis as for the
morphological analysis.

We extend the category formation and functional
application rules of the previous categorial unification
grammar (?; ?) to deal with the word order variations in
Korean:

• if category $a \in C$, then $a \in C$

• if category $a \in C$ and category set $S \subseteq C$,
  then $a/S \in C$ and $a \backslash S \in C$,

where $S$ is an unordered set of categories.

• left cancellation: $a_i \;\; b \backslash \{a_1, a_2, \ldots, a_n\}$ results in
  $b \backslash \{a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n\}$

• right cancellation: $b / \{a_1, a_2, \ldots, a_n\} \;\; a_i$ results in
  $b / \{a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n\}$
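The cancellation rules above can be sketched with an unordered argument set. This is a hedged illustration, not the grammar used in SKOPE: the `Functor` type, the `cancel` and `combine` functions, the category strings, and the choice to return the bare result category once the argument set is empty are all assumptions made for the example.

```python
from typing import FrozenSet, NamedTuple


class Functor(NamedTuple):
    result: str            # category b
    slash: str             # "\\" takes arguments on the left, "/" on the right
    args: FrozenSet[str]   # unordered set of still-expected argument categories


def cancel(functor, arg):
    """Remove `arg` from the functor's argument set, or return None if it is
    not expected. When no arguments remain, return the result category."""
    if arg not in functor.args:
        return None
    rest = functor.args - {arg}
    return Functor(functor.result, functor.slash, rest) if rest else functor.result


def combine(left, right):
    """Apply left cancellation (arg  b\\S) or right cancellation (b/S  arg)."""
    if isinstance(right, Functor) and right.slash == "\\" and isinstance(left, str):
        return cancel(right, left)
    if isinstance(left, Functor) and left.slash == "/" and isinstance(right, str):
        return cancel(left, right)
    return None


# A verb expecting a subject and an object on its left, in either order.
verb = Functor("s", "\\", frozenset({"np[subj]", "np[obj]"}))
subj_first = combine("np[subj]", combine("np[obj]", verb))
obj_first = combine("np[obj]", combine("np[subj]", verb))
```

Because the arguments are held in a set rather than a list, both argument orders reduce to the same result, which is how the extension accommodates free word order.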

The syntax analysis is performed by interactive relaxation
(spreading activation) parsing on the categorial grammar,
where the positions of the functional applications are
controlled by a triangular table. The original interactive
relaxation parsing (?) was extended to provide efficient
constituent searching and expectation generation through
the positional information provided by the categorial
grammar and the triangular table. Figure 6 shows the
table-driven interactive relaxation parsing.

The interactive relaxation process consists of the fol- 
lowing three steps that are repetitively executed: 1) add 
nodes, 2) spread activation, and 3) decay. 

add nodes Grammar nodes (syntactic categories from the
dictionary) are added for each sense of the morphemes
when the parsing begins. A grammar node whose activation
exceeds the predefined threshold Θ generates new nodes in
the proper positions (to be discussed shortly). The newly
generated nodes search for constituents (expectations)
that are in the appropriate table positions and are of
the proper function-applicable categories. For example,
in figure 6, when np\np(2,2) fires, it generates np(1,2).
The generated np(1,2) searches for the constituent
np(1,1) to be combined with np\np(2,2).

4 This legality comes from the Korean phonology rule of
glottalization (one form of consonant dissimilation),
stating that an s sound becomes an ss sound after an l
sound.

Figure 6: Table-driven interactive relaxation parsing of
a categorial grammar. The input sentence is phail-tul-ul
ciwu-ela (delete the files). Only a single morpheme chain
and only one sense for each morpheme are shown as input for
clear illustration. The subj, obj, and comm indicate
np[subj], np[obj], and s[command], respectively. The table
contains only the nodes that participate in the final parse
trees.

spread activation The bottom-up spreading activation is as
follows:

$$ n \times p \times a \times \frac{a_i^2}{\sum_{j=1}^{n} a_j^2} $$

where a predefined portion $p$ of the total activation $a$
is passed upward to the node with activation $a_i$ among
the $n$ parents, each with node activation $a_j$. In other
words, a node with large activation gets more and more
activation, which gives an inhibition effect without
explicit inhibitory links (?). The top-down spreading
activation uniformly distributes

$$ p' \times a $$

among the children, where $p'$ is a predefined portion of
the source activation $a$.

decay The node's activation decays with time. A node with
fewer constituents than needed is penalized in addition
to the decay:

$$ a \times (1 - d) \times \frac{C_a}{C_r} $$

where $a$ is the activation value, $d$ is the decay
ratio, and $C_a$ and $C_r$ are the numbers of actual and
required constituents. After the decay, a node with less
activation than the predefined threshold Φ is removed
from the table.
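One relaxation cycle can be sketched numerically as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the node labels in the example are invented, and the bottom-up share is assumed to be proportional to the squared parent activation. The decay rule and the parameter values (d = 0.87, generation threshold 0.51, removal threshold 0.066) follow the text.

```python
def spread_up(a, parent_acts, p):
    """Activation sent from a node with activation `a` to each of its parents.
    ASSUMPTION: each parent's share is proportional to its squared activation."""
    n = len(parent_acts)
    total = sum(x * x for x in parent_acts)
    if total == 0:
        return [0.0] * n
    return [n * p * a * (x * x) / total for x in parent_acts]


def decay(a, d, actual, required):
    """Decay with a penalty for missing constituents: a * (1 - d) * Ca / Cr."""
    return a * (1 - d) * actual / required


def prune(table, removal_threshold):
    """Keep only nodes whose activation has not fallen below the threshold."""
    return {node: act for node, act in table.items()
            if act >= removal_threshold}


# A node sitting at the generation threshold survives one decay step only
# if it has found all of its required constituents.
table = {
    "complete node": decay(0.51, 0.87, actual=2, required=2),
    "incomplete node": decay(0.51, 0.87, actual=1, required=2),
}
survivors = prune(table, removal_threshold=0.066)
shares = spread_up(1.0, [0.5, 0.5], p=0.05)
```

With equal parent activations each parent receives p × a, while an unequal split favours the stronger parent, which is the competition effect the text attributes to the bottom-up rule.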

The node generation and constituent search positions are
controlled by the triangular table. When the node a(i,j)
acts as an argument, it generates nodes only in the
positions (k,j) where 1 ≤ k < i, and each generated node
searches for its constituents (functors) only in the
position (k,i-1). Alternatively, when a node is generated
in a position (i,k) where j < k ≤ number-of-morphemes, it
searches the position (j+1,k) for its constituents. When
the node acts as a functor, the same position restrictions
apply to the node generation and the argument searching.
The position control combined with the interactive
relaxation provides an efficient, lexically oriented, and
robust syntax analysis of spoken languages.
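The position restrictions above can be sketched as two small enumerations over the triangular table. This is an illustrative sketch: the function names are hypothetical, and the index bounds (1 ≤ k < i on the left, j < k ≤ number of morphemes on the right) are an assumption chosen so that the example from the text, np\np(2,2) generating np(1,2) and searching np(1,1), is reproduced.

```python
def left_expansions(i, j):
    """For a node at (i, j): pairs of (generated position, search position)
    when the missing constituent lies to the left."""
    return [((k, j), (k, i - 1)) for k in range(1, i)]


def right_expansions(i, j, n_morphemes):
    """For a node at (i, j): pairs of (generated position, search position)
    when the missing constituent lies to the right."""
    return [((i, k), (j + 1, k)) for k in range(j + 1, n_morphemes + 1)]


# The example sentence phail-tul-ul ciwu-ela has five morphemes.
left = left_expansions(2, 2)
right = right_expansions(2, 2, n_morphemes=5)
```

Each generated span is the union of the firing node's span and an adjacent span, so only adjacent constituents are ever tried against each other.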

Implementation and experiments 

SKOPE was fully implemented on a UNIX/C platform and has
been extensively tested in practical domains such as a
natural language interface to operating systems. The
phoneme recognition module targets 1000-morpheme continuous
speech and is currently speaker dependent because of the
shortage of standard speech databases for Korean. The
unified morpheme-level phonetic dictionary has about 1000
morpheme entries and is compiled into the trie structure.
The morpheme-connectivity-matrix and
phoneme-connectivity-matrix are encoded with the special
Korean POS (part-of-speech) symbols and compressed.

This section demonstrates SKOPE's performance in
continuous diphone recognition, morphological analysis, and
syntax analysis experiments. For the continuous diphone
recognition experiment, we generated about 5500 diphone
patterns from 990 Eojeol patterns (66 Eojeols, each
pronounced 15 times) for the training of the TDNNs. In the
performance phase, 2600 new test Eojeol patterns (260
Eojeols, each pronounced 10 times) are continuously shifted
in 30 msec steps and generate 7772 test diphone patterns
disjoint from the training patterns. Figure 7-a shows the
continuous diphone recognition performance. Correct
designates that the correct target diphone was spotted in
the testing position, and delete designates the other case.
Insert designates that non-target diphones were spotted in
the testing position. To compare the ability to handle
continuous speech, we also tested the diphone recognition
using hand-segmented test patterns with the same 7772
target diphones. Figure 7-b shows the segmented diphone
recognition performance. Since the test data are already
hand-segmented before input, there are no insertion or
deletion errors in this case. The fact that the segmented
speech performance is not much better than the continuous
one (93.8% vs. 93.4%) demonstrates the diphone's
suitability for handling continuous speech.

For the morphological analysis performance, we used the
same 990 Eojeol patterns to train the phoneme recognition
module, and the 2600 Eojeol patterns to test the
morphological analysis performance directly from the speech
input. Figure 8 shows the results.

Figure 7: (a) Continuous diphone recognition versus (b)
segmented diphone recognition.

Figure 8: Morphological analysis from continuous speech
signals. The table indicates that, among the total of 9605
morphemes in the 2600 Eojeol patterns, 80.1% are correctly
recognized and analyzed, and 19.8% cannot be analyzed
because of deletion errors. In addition, 7182 spurious
morphemes are generated due to the speech insertion errors.

This experiment shows that most of the morphological errors
are propagated from the incorrect (deleted) or spurious
(inserted) phoneme recognition results. To see the original
performance of the morphological and syntactic analysis
modules assuming no speech recognition error, we
artificially made phoneme lattices by mutating the
correctly recognized phoneme sequences according to the
phoneme recognizer's confusion matrix. Each phoneme lattice
was made to contain at least one correct recognition
result, so the phoneme recognition performance is assumed
to be perfect except for the artificially made insertion
errors (mutations). In this way, we made 6 or 7 lattices
for each of the 50 sentences, 330 phoneme lattices
altogether. The average number of phoneme alternatives per
single correct phoneme in the lattice is 2.3, and the
average sentence length is 31 phonemes. This means there
are on average $2.3^{31}$ phoneme chains in each lattice.
The sentences used are natural language commands to UNIX
(?) and are fairly complex, with one or two embedded
sentences or conjunctions. Figure 9 shows the morphological
and syntactic analysis results for these artificially made
phoneme lattices. For the syntactic-level interactive
relaxation, we used the following parameters (which were
experimentally determined): upward propagation portion
p = 0.05, downward propagation portion p' = 0.03, decay
ratio d = 0.87, node generation threshold Θ = 0.51, and
node removal threshold Φ = 0.066.
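To make the size of the search space concrete, the number of phoneme chains implied by the averages above can be computed directly. The variable names are illustrative only; the two input figures come from the text.

```python
# Average number of alternatives per phoneme position and average
# sentence length in phonemes, as reported in the text.
alternatives_per_phoneme = 2.3
sentence_length = 31

# Every position contributes its alternatives independently, so the number
# of distinct chains is the product over all positions.
chains = alternatives_per_phoneme ** sentence_length
print(f"about {chains:.1e} phoneme chains per lattice")
```

The count is on the order of 10^11 chains per lattice, which is why exhaustive enumeration is impractical and the trie-based pruning during segmentation matters.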

The morphological analysis was perfect, as shown in the
table. Since each phoneme lattice was made to contain at
least one correct phoneme recognition result, the
morphological analysis must be perfect as long as the
morphemes are enrolled in the dictionary and the
connectivity information covers all the morpheme
combinations. This was possible due to the small number of
tested sentences (50 sentences). These results verify that
most of the morphological analysis errors from real speech
input are actually propagated from the phoneme recognition
errors, as discussed before. However, the syntax analysis
results are marginal here, since we only count the single
best-scored tree and do not yet use any semantic features
in the analysis. The syntax analysis failures mainly come
from 1) the insertion errors (artificial mutations) in the
phoneme lattices^5, which result in ambiguous morpheme
lattices and finally produce redundant syntax trees, and
2) the inherent structural ambiguities in the sentences.
These failures should be greatly reduced if we generate
n-best scored parse trees and let the semantic processing
module select the correct ones, as is usually done in most
probabilistic parsing schemes (?).

Figure 9: The morphological and syntactic analysis from the
artificially made phoneme lattices.



5 Recall that we generated an average of 2.3 phonemes per
single correct phoneme.

Conclusions and future works

This paper explains the design and implementation of the
spoken Korean processing engine, a connectionist/symbolic
hybrid model of spoken language processing that utilizes
the linguistic characteristics of Korean. The SKOPE model
demonstrates the synergetic integration of connectionist
and symbolic techniques by considering the relative
strengths and weaknesses of the two techniques, and also
demonstrates phoneme-level speech and language integration
for general morphological processing of agglutinative
languages. Besides these two major contributions, the SKOPE
architecture has the following unique features in spoken
language processing: 1) diphones are newly developed as a
sub-word recognition unit for connectionist Korean speech
recognition, 2) the morphological and syntactic analyses
are tightly coupled by using uniform table-driven control,
3) the phonological and orthographic rules are uniformly
co-modeled declaratively, and 4) the table-driven
interactive relaxation parsing and the extension of the
categorial grammar can provide robust handling of
word-order variations in Korean.

However, the current implementation of the system still
suffers from excessive continuous speech recognition
errors. Since large-vocabulary continuous speech
recognition is still an open problem, we cannot hope for
100% correct speech recognition results in the near future.
Currently, we are pursuing multi-strategic approaches to an
advanced spoken language processing model, including
optimizing the TDNN-based phoneme recognition module,
integrating an HMM-based morpheme recognition module into
the connectionist phoneme recognition, and incorporating
probabilistic searches into the morphological analysis
process as well as the syntactic analysis process. We are
also developing applications on top of SKOPE, including a
speech-to-speech translation system and an intelligent
interface agent for the UNIX operating system. We hope our
approach can be extended to other agglutinative languages
such as Japanese, Finnish, and Turkish, and also to
languages that have complex morphological phenomena such as
German and Dutch.



